Orchestrating chromosome conformation capture analysis with Bioconductor

Genome-wide chromatin conformation capture assays provide formidable insights into the spatial organization of genomes. However, due to the complexity of the data structure, their integration in multi-omics workflows remains challenging. We present data structures, computational methods and visualization tools available in Bioconductor to investigate Hi-C, micro-C and other 3C-related data, in R. An online book (https://bioconductor.org/books/OHCA/) further provides prospective end users with a number of workflows to process, import, analyze and visualize any type of chromosome conformation capture data.


REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author): The manuscript "Orchestrating chromosome conformation capture analysis with Bioconductor" describes Bioconductor packages, data structures, visualization capabilities, and workflow/tutorial-like case scenarios of various analyses of Hi-C data. In essence, it is a detailed vignette for the HiCExperiment, HiContacts, HiCool, fourDNData and DNAZooData packages developed by the authors. The manuscript is short but the online version is extensive and detailed.
- The book introduces many external packages and pipelines. While generally comprehensive, such references are incomplete. For example, when describing the general pipelines for Hi-C data processing, the HiCExplorer, HiCUP and FanC analysis pipelines are not mentioned. The GENOVA and Mariner R packages, which offer similar analysis and visualization functionality, are not mentioned either. It is not required, and perhaps not possible, to refer to all of these, but it may be beneficial to add a reference to more general lists of Hi-C software, such as https://github.com/mdozmorov/HiC_tools, which is also incomplete but covers a broad spectrum of Hi-C tools.
- The rendering of some references is inconsistent. E.g., "J.O. et al. (2017)", while the more readable one would be "Davies J.O. et al. (2017)". Most references also lack initials, e.g., "Durand et al. (2016)". Also, each page has an empty "References" header at the bottom, but no references until the last page. It is a minor annoyance; OK if impossible to correct in Quarto.
- In the sentence "More information about the conventions related to this text file are provided by the 4DN consortium", I suggest moving the link from "4DN consortium" to "More information", because that is what the link refers to.
- There are many ideas for how this project could be developed. The Discussion is somewhat short and lacks a description of future directions. Adding them would be beneficial.
Reviewer #2 (Remarks to the Author): "Orchestrating chromosome conformation capture analysis with Bioconductor" by J. Serizay and colleagues describes their R code to process and analyse Hi-C data.
The approach developed by the authors combines state-of-the-art approaches for efficient preprocessing, handling and higher-level analysis of chromatin conformation data (limited by R and the other software deployed by the functions).
Although the authors do not present novel approaches that would change the way HiC data are analysed, I am very positive about the work presented in this paper.
First of all, it is essential to provide the community with tools that are easily implemented and shared across labs. There are several "pipelines" in the field (the authors take advantage of them). Yet, none of these tools brings the analysis together so nicely in one framework that is easy to implement and (potentially) to be modified by others. Using R/Bioconductor seems like a perfect way to address this need.
Second, the approaches the authors use seem pretty complete, from preprocessing to the analysis of compartments and domains. The visualisation functions are particularly nice and versatile (saddle plots, virtual 4C).
Third, despite the lack of a fundamentally novel approach, the authors introduce a new way of storing HiC data in R, which is a great plus as it will make it easier to intersect datasets. Likewise, I like the infrastructure to query the data available from 4DN.
Having said all that, it would be important to generate more information that will be useful for users. The most recent approaches to studying chromatin conformation bring a dramatic increase in Hi-C resolution. The authors should address the hardware requirements for manipulating data at the resolution of single nucleosomes.
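To put the scale of this hardware question in numbers, the short Python sketch below estimates how much memory a dense, genome-wide contact matrix would need at several resolutions. It is an illustration only, not code from the packages under review; it assumes a genome of about 3.1 Gb and 8-byte values for every matrix cell, and treats 200 bp bins as a rough stand-in for nucleosome-level resolution. The helper names `n_bins` and `dense_bytes` are hypothetical.

```python
# Rough memory estimate for a dense, genome-wide contact matrix.
# Assumptions (illustrative, not taken from the manuscript): a ~3.1 Gb genome
# and 8-byte (float64) values for every cell of the dense matrix.

GENOME_BP = 3_100_000_000
BYTES_PER_VALUE = 8


def n_bins(length_bp: int, resolution_bp: int) -> int:
    """Number of fixed-size bins needed to tile `length_bp` (ceiling division)."""
    return -(-length_bp // resolution_bp)


def dense_bytes(length_bp: int, resolution_bp: int) -> int:
    """Bytes needed to hold a full n x n dense matrix at this resolution."""
    n = n_bins(length_bp, resolution_bp)
    return n * n * BYTES_PER_VALUE


if __name__ == "__main__":
    # 200 bp is used here as a rough proxy for nucleosome-level resolution.
    for res in (1_000_000, 100_000, 10_000, 1_000, 200):
        size_gb = dense_bytes(GENOME_BP, res) / 1e9
        print(f"{res:>9,} bp: {n_bins(GENOME_BP, res):>12,} bins, "
              f"dense matrix ~{size_gb:,.1f} GB")
    # By contrast, a single 2 Mb window at 1 kb resolution is small.
    print(f"2 Mb window at 1 kb: ~{dense_bytes(2_000_000, 1_000) / 1e6:.0f} MB")
```

Under these assumptions the dense matrix grows from about 77 MB at 1 Mb resolution to roughly 77 TB at 1 kb and about 1.9 PB at 200 bp, while a single 2 Mb window at 1 kb needs only about 32 MB. The quadratic growth in bin count is why sparse storage and partial, region-by-region import matter more at high resolution than dense in-memory matrices.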
It would perhaps also be interesting to implement sub-compartment calling.
Reviewer #3 (Remarks to the Author): In this work, Serizay and colleagues describe a Bioconductor-based workflow for end-to-end analysis of chromatin conformation capture (mostly Hi-C) data. They implement data structures to represent Hi-C datasets and genomic contacts on file, as well as methods to perform common manipulations, calculations and visualization. The manuscript is concise and well-written, and the accompanying book is comprehensive and user-friendly. This is likely to be a great resource for the Bioconductor community and I don't have any major concerns about it. Nonetheless, I will throw in some comments so as to prove that I did do my job as a reviewer.

MINOR:
- Be careful about the name of the "HiCExperiment" class. This suggests that it is a SummarizedExperiment subclass (e.g., SingleCellExperiment, SpatialExperiment)... which it isn't. Users may think that it is possible to use SummarizedExperiment methods like assay() and colData(), and be surprised when those fail. Perhaps it is too late to change it at this point, but I will just say this can become a source of regret later on; for example, the naming of "InteractionSet" is quite unfortunate, because it actually is a SummarizedExperiment subclass but the name suggests otherwise.
- Perhaps the authors could explore the use of DelayedArrays to represent contact matrices directly from the various Hi-C file formats, especially those based on HDF5. This would provide a matrix-like API that extracts data on demand, e.g., users could intuitively subset with the usual "[" operators before pulling out the subset of data that they're interested in.
- I am sure that this is already being considered, but it would be worthwhile deploying the book as part of the Bioconductor book releases, e.g., https://bioconductor.org/books/release/. This checks that the book runs correctly with the latest versions of all packages, and the Bioconductor domain name is also a bit more official than a random-looking GH pages site.
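The on-demand extraction suggested in these comments can be sketched in a few lines. The Python class below is a hypothetical illustration, not DelayedArray or HiCExperiment code: it assumes a pixel table sorted by (bin1, bin2) together with a CSR-style row index (`offsets`), similar in spirit to the `bin1_offset` index stored in `.cool` files, and it stores both triangles of the matrix for simplicity, whereas `.cool` files normally store only the upper triangle. The names `build_offsets` and `LazyContactMatrix` are made up for this sketch. Slicing with `[rows, cols]` reads only the rows the slice covers.

```python
from typing import List, Sequence, Tuple


def build_offsets(bin1: Sequence[int], n_bins: int) -> List[int]:
    """Build a CSR-style row index from bin1 ids.

    Pixels of row i occupy positions offsets[i] .. offsets[i + 1] - 1.
    Assumes pixels are sorted by (bin1, bin2), so each row is contiguous.
    """
    offsets = [0] * (n_bins + 1)
    for b in bin1:
        offsets[b + 1] += 1           # count pixels per row
    for i in range(n_bins):
        offsets[i + 1] += offsets[i]  # cumulative sum -> start positions
    return offsets


class LazyContactMatrix:
    """Hypothetical matrix-like view that reads only the rows a slice touches."""

    def __init__(self, offsets: Sequence[int], bin2: Sequence[int], count: Sequence[float]):
        n = len(offsets) - 1
        self.shape = (n, n)
        self._offsets = offsets
        self._bin2 = bin2        # in a real backend these would stay on disk
        self._count = count
        self.rows_read = 0       # how many rows have been scanned so far

    def __getitem__(self, key: Tuple[slice, slice]) -> List[List[float]]:
        rows, cols = key
        r0, r1, rstep = rows.indices(self.shape[0])
        c0, c1, cstep = cols.indices(self.shape[1])
        if rstep != 1 or cstep != 1:
            raise ValueError("only contiguous slices are supported in this sketch")
        out = [[0.0] * max(c1 - c0, 0) for _ in range(max(r1 - r0, 0))]
        for i in range(r0, r1):
            self.rows_read += 1
            # Jump straight to row i's pixels via the index instead of scanning everything.
            for k in range(self._offsets[i], self._offsets[i + 1]):
                j = self._bin2[k]
                if c0 <= j < c1:
                    out[i - r0][j - c0] = self._count[k]
        return out


if __name__ == "__main__":
    # Toy 4-bin matrix, both triangles stored, sorted by (bin1, bin2).
    bin1 = [0, 0, 1, 1, 2, 3]
    bin2 = [0, 1, 0, 1, 2, 3]
    count = [5.0, 2.0, 2.0, 7.0, 4.0, 1.0]
    m = LazyContactMatrix(build_offsets(bin1, 4), bin2, count)
    print(m[0:2, 0:2])  # [[5.0, 2.0], [2.0, 7.0]]
    print(m.rows_read)  # 2
```

Because the row index gives the start and end of each row's pixels directly, the cost of a query scales with the number of rows requested rather than with the size of the whole dataset; in a real backend, `bin2` and `count` would be read from disk (for example from HDF5) instead of being held in memory.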

VERY MINOR:
- There are a couple of references to basilisk, but they are missing an actual citation AFAICT.
- The book has lots of notes and reminders in alert boxes. I find these mildly distracting at their current frequency.
- diffHic is another (maybe the oldest?) package for differential Hi-C analyses that you could mention in Chapter 9. As long as you can transform your data into an InteractionSet, it should be good to go.
We would like to sincerely thank the three reviewers for their review of the manuscript. We have made every effort to systematically address each comment by updating the manuscript and source code accordingly. Here we provide a point-by-point response to each reviewer.

Reviewer #1 (Remarks to the Author):
The manuscript "Orchestrating chromosome conformation capture analysis with Bioconductor" describes Bioconductor packages, data structures, visualization capabilities, and workflow/tutorial-like case scenarios of various analyses of Hi-C data. In essence, it is a detailed vignette for the HiCExperiment, HiContacts, HiCool, fourDNData and DNAZooData packages developed by the authors. The manuscript is short but the online version is extensive and detailed.
We thank Reviewer #1 for their careful reading of our manuscript and our supporting online book.
- The book introduces many external packages and pipelines. While generally comprehensive, such references are incomplete. For example, when describing the general pipelines for Hi-C data processing, the HiCExplorer, HiCUP and FanC analysis pipelines are not mentioned. The GENOVA and Mariner R packages, which offer similar analysis and visualization functionality, are not mentioned either. It is not required, and perhaps not possible, to refer to all of these, but it may be beneficial to add a reference to more general lists of Hi-C software, such as https://github.com/mdozmorov/HiC_tools, which is also incomplete but covers a broad spectrum of Hi-C tools.
Reviewer #2 (Remarks to the Author):
"Orchestrating chromosome conformation capture analysis with Bioconductor" by J. Serizay and colleagues describes their R code to process and analyse Hi-C data.
The approach developed by the authors combines state-of-the-art approaches for efficient preprocessing, handling and higher-level analysis of chromatin conformation data (limited by R and the other software deployed by the functions).
Although the authors do not present novel approaches that would change the way HiC data are analysed, I am very positive about the work presented in this paper.
First of all, it is essential to provide the community with tools that are easily implemented and shared across labs. There are several "pipelines" in the field (the authors take advantage of them). Yet, none of these tools brings the analysis together so nicely in one framework that is easy to implement and (potentially) to be modified by others. Using R/Bioconductor seems like a perfect way to address this need.
Second, the approaches the authors use seem pretty complete, from preprocessing to the analysis of compartments and domains. The visualisation functions are particularly nice and versatile (saddle plots, virtual 4C).
Third, despite the lack of a fundamentally novel approach, the authors introduce a new way of storing HiC data in R, which is a great plus as it will make it easier to intersect datasets. Likewise, I like the infrastructure to query the data available from 4DN.
We thank Reviewer #2 for their appreciation of our work and their helpful comments.
Having said all that, it would be important to generate more information that will be useful for users. The most recent approaches to studying chromatin conformation bring a dramatic increase in Hi-C resolution. The authors should address the hardware requirements for manipulating data at the resolution of single nucleosomes.
We did not address this point in the original manuscript. Following the reviewer's suggestion, we added a few sentences to the "Data representation" section of the revised manuscript to discuss the hardware required to manipulate chromosome conformation capture data at increasing resolutions.
"Thanks to ever-decreasing sequencing costs and improving technology, the average size of chromosome conformation capture datasets is continuously increasing, both in sequencing depth and in resolution. HDF5-derived `.(m)cool` and binary `.hic` files both efficiently store such large-scale data, and HiCExperiment objects instantiated from these file formats benefit from efficient parsing libraries based on C code, optimized for speed. Furthermore, because random access is supported for these file formats, contact matrices can be partially imported in R, allowing manipulation of large datasets (such as deeply sequenced micro-C datasets) even on personal laptops with a standard hardware configuration (e.g. 4 CPUs and 8-16 GB RAM)."
It would perhaps also be interesting to implement sub-compartment calling.
Several libraries (available in R or in other programming languages) already provide these specific functionalities and are widely used. For this reason, we believe that (re)implementing them in yet another library would not meaningfully improve Hi-C analysis overall. Instead, we added a sentence mentioning existing R libraries dedicated to compartment and sub-compartment calling.
"It is nonetheless advised to investigate structural features using a range of different methods. For instance, A/B compartments can be identified in R with the HiTC and HiCDOC packages, while finer sub-compartments can be annotated using CALDER. To allow end users to rely on the best-suited existing R packages, HiCExperiment objects can be coerced into standard data structures such as matrices, data frames or GInteractions."